Introduction

The goal of this notebook is to validate the best model identified in the previous work. Here, we follow two different applications:

  • Balance Beta and Release on the other covariates (e.g., environment and performance metrics), then look at the difference in the user engagement metrics between the balanced Beta and Release groups for that version (N). This shows how Beta clients with similar environments and performance compare with Release clients in terms of usage.
  • Balance the Beta and Release datasets so that they resemble each other across the covariates we are concerned with. Balancing, in this case, yields a set of Beta client_id values that resembles Release. We then query the current Beta data (version N+1) for these client_id values and calculate the metrics of interest from the covariates of interest. This is our outcome.

Data Preparation

Training

statistic              value
rows                   302819
columns                95
discrete_columns       7
continuous_columns     88
all_missing_columns    0
total_missing_values   0
complete_rows          302819
total_observations     28767805
memory_usage           174448528

First Application - Training Dataset (V67)

In this application, we balance the two groups (Beta and Release) on the other covariates (e.g., environment and performance metrics) and then look at the difference in user engagement metrics between the balanced Beta and Release groups for that version (N). This tells us how Beta differs from Release in user engagement, with all the other covariates being equal.

Modeling

Setting the selected experiment from previous work.

Match using Nearest Neighbor matching - Mahalanobis: Full Dataset

The best model from previous work.

## 
## Call:
## matchit(formula = generate_formula(covariates, label), data = df_train_1x, 
##     method = "nearest", distance = "mahalanobis")
## 
## Summary of balance for all data:
##                                   Means Treated Means Control SD Control
## daily_num_sessions_started               2.8892        2.3689     2.7099
## daily_num_sessions_started_max           5.2594        4.2814     4.8046
## FX_PAGE_LOAD_MS_2_PARENT              3032.6864     3463.7084  1920.1256
## memory_mb                             9451.9965     8965.1561  7925.6741
## num_active_days                          5.5720        5.3462     2.2644
## num_addons                               5.6508        7.8554     3.3349
## num_bookmarks                          159.0916      242.4878  1292.6145
## profile_age                            891.6771      893.7534   762.4791
## session_length                           9.2780       12.2962    14.7435
## session_length_max                      18.2733       22.7066    30.2272
## TIME_TO_DOM_COMPLETE_MS               3296.7350     4388.9143  4254.1363
## TIME_TO_DOM_CONTENT_LOADED_END_MS     2296.4590     2737.6381  2700.5353
## TIME_TO_DOM_INTERACTIVE_MS            1802.1151     2404.3932  2397.0177
## TIME_TO_LOAD_EVENT_END_MS             3021.9607     4126.9123  4007.0480
## TIME_TO_NON_BLANK_PAINT_MS            1452.3717     1833.6557  2128.4506
##                                    Mean Diff  eQQ Med  eQQ Mean     eQQ Max
## daily_num_sessions_started            0.5203   0.4000    0.5205      2.0000
## daily_num_sessions_started_max        0.9780   1.0000    0.9782      5.0000
## FX_PAGE_LOAD_MS_2_PARENT           -431.0220 296.6689  431.0921   1248.7189
## memory_mb                           486.8404  30.0000  532.1724 262176.0000
## num_active_days                       0.2259   0.0000    0.2391      1.0000
## num_addons                           -2.2046   2.0000    2.2069    112.6667
## num_bookmarks                       -83.3962   1.0000   83.8425  21946.0000
## profile_age                          -2.0762  22.0000   25.0648   1346.0000
## session_length                       -3.0182   1.3870    3.0201    151.0160
## session_length_max                   -4.4332   2.6072    4.4431    873.3167
## TIME_TO_DOM_COMPLETE_MS           -1092.1793 415.1250 1092.2229  13977.0000
## TIME_TO_DOM_CONTENT_LOADED_END_MS  -441.1792 226.8665  441.2334  10881.3696
## TIME_TO_DOM_INTERACTIVE_MS         -602.2781 241.0401  602.3597  18504.4538
## TIME_TO_LOAD_EVENT_END_MS         -1104.9516 436.0927 1105.0155  13816.3014
## TIME_TO_NON_BLANK_PAINT_MS         -381.2840 156.1499  381.7221  18095.0000
## 
## 
## Summary of balance for matched data:
##                                   Means Treated Means Control SD Control
## daily_num_sessions_started               2.8892        2.3689     2.7099
## daily_num_sessions_started_max           5.2594        4.2814     4.8046
## FX_PAGE_LOAD_MS_2_PARENT              3032.6864     3463.7084  1920.1256
## memory_mb                             9451.9965     8965.1561  7925.6741
## num_active_days                          5.5720        5.3462     2.2644
## num_addons                               5.6508        7.8554     3.3349
## num_bookmarks                          159.0916      242.4878  1292.6145
## profile_age                            891.6771      893.7534   762.4791
## session_length                           9.2780       12.2962    14.7435
## session_length_max                      18.2733       22.7066    30.2272
## TIME_TO_DOM_COMPLETE_MS               3296.7350     4388.9143  4254.1363
## TIME_TO_DOM_CONTENT_LOADED_END_MS     2296.4590     2737.6381  2700.5353
## TIME_TO_DOM_INTERACTIVE_MS            1802.1151     2404.3932  2397.0177
## TIME_TO_LOAD_EVENT_END_MS             3021.9607     4126.9123  4007.0480
## TIME_TO_NON_BLANK_PAINT_MS            1452.3717     1833.6557  2128.4506
##                                    Mean Diff  eQQ Med  eQQ Mean     eQQ Max
## daily_num_sessions_started            0.5203   0.4000    0.5205      2.0000
## daily_num_sessions_started_max        0.9780   1.0000    0.9782      5.0000
## FX_PAGE_LOAD_MS_2_PARENT           -431.0220 296.6689  431.0921   1248.7189
## memory_mb                           486.8404  30.0000  532.1724 262176.0000
## num_active_days                       0.2259   0.0000    0.2391      1.0000
## num_addons                           -2.2046   2.0000    2.2069    112.6667
## num_bookmarks                       -83.3962   1.0000   83.8425  21946.0000
## profile_age                          -2.0762  22.0000   25.0648   1346.0000
## session_length                       -3.0182   1.3870    3.0201    151.0160
## session_length_max                   -4.4332   2.6072    4.4431    873.3167
## TIME_TO_DOM_COMPLETE_MS           -1092.1793 415.1250 1092.2229  13977.0000
## TIME_TO_DOM_CONTENT_LOADED_END_MS  -441.1792 226.8665  441.2334  10881.3696
## TIME_TO_DOM_INTERACTIVE_MS         -602.2781 241.0401  602.3597  18504.4538
## TIME_TO_LOAD_EVENT_END_MS         -1104.9516 436.0927 1105.0155  13816.3014
## TIME_TO_NON_BLANK_PAINT_MS         -381.2840 156.1499  381.7221  18095.0000
## 
## Percent Balance Improvement:
##                                   Mean Diff. eQQ Med eQQ Mean eQQ Max
## daily_num_sessions_started                 0       0        0       0
## daily_num_sessions_started_max             0       0        0       0
## FX_PAGE_LOAD_MS_2_PARENT                   0       0        0       0
## memory_mb                                  0       0        0       0
## num_active_days                            0       0        0       0
## num_addons                                 0       0        0       0
## num_bookmarks                              0       0        0       0
## profile_age                                0       0        0       0
## session_length                             0       0        0       0
## session_length_max                         0       0        0       0
## TIME_TO_DOM_COMPLETE_MS                    0       0        0       0
## TIME_TO_DOM_CONTENT_LOADED_END_MS          0       0        0       0
## TIME_TO_DOM_INTERACTIVE_MS                 0       0        0       0
## TIME_TO_LOAD_EVENT_END_MS                  0       0        0       0
## TIME_TO_NON_BLANK_PAINT_MS                 0       0        0       0
## 
## Sample sizes:
##           Control Treated
## All         59627   59627
## Matched     59627   59627
## Unmatched       0       0
## Discarded       0       0
##                                                Stratified by is_release
##                                                 FALSE            
##   n                                               59627          
##   daily_num_sessions_started (mean (SD))           2.37 (2.71)   
##   daily_num_sessions_started_max (mean (SD))       4.28 (4.80)   
##   FX_PAGE_LOAD_MS_2_PARENT (mean (SD))          3463.71 (1920.13)
##   memory_mb (mean (SD))                         8965.16 (7925.67)
##   num_active_days (mean (SD))                      5.35 (2.26)   
##   num_addons (mean (SD))                           7.86 (3.33)   
##   num_bookmarks (mean (SD))                      242.49 (1292.61)
##   profile_age (mean (SD))                        893.75 (762.48) 
##   session_length (mean (SD))                      12.30 (14.74)  
##   session_length_max (mean (SD))                  22.71 (30.23)  
##   TIME_TO_DOM_COMPLETE_MS (mean (SD))           4388.91 (4254.14)
##   TIME_TO_DOM_CONTENT_LOADED_END_MS (mean (SD)) 2737.64 (2700.54)
##   TIME_TO_DOM_INTERACTIVE_MS (mean (SD))        2404.39 (2397.02)
##   TIME_TO_LOAD_EVENT_END_MS (mean (SD))         4126.91 (4007.05)
##   TIME_TO_NON_BLANK_PAINT_MS (mean (SD))        1833.66 (2128.45)
##                                                Stratified by is_release
##                                                 TRUE              SMD   
##   n                                               59627                 
##   daily_num_sessions_started (mean (SD))           2.89 (2.98)     0.183
##   daily_num_sessions_started_max (mean (SD))       5.26 (5.38)     0.192
##   FX_PAGE_LOAD_MS_2_PARENT (mean (SD))          3032.69 (1582.59)  0.245
##   memory_mb (mean (SD))                         9452.00 (8755.30)  0.058
##   num_active_days (mean (SD))                      5.57 (2.06)     0.104
##   num_addons (mean (SD))                           5.65 (2.23)     0.777
##   num_bookmarks (mean (SD))                      159.09 (668.92)   0.081
##   profile_age (mean (SD))                        891.68 (766.17)   0.003
##   session_length (mean (SD))                       9.28 (9.46)     0.244
##   session_length_max (mean (SD))                  18.27 (19.75)    0.174
##   TIME_TO_DOM_COMPLETE_MS (mean (SD))           3296.73 (2709.40)  0.306
##   TIME_TO_DOM_CONTENT_LOADED_END_MS (mean (SD)) 2296.46 (2242.23)  0.178
##   TIME_TO_DOM_INTERACTIVE_MS (mean (SD))        1802.12 (1521.63)  0.300
##   TIME_TO_LOAD_EVENT_END_MS (mean (SD))         3021.96 (2458.11)  0.332
##   TIME_TO_NON_BLANK_PAINT_MS (mean (SD))        1452.37 (1470.00)  0.208
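
The call above uses nearest-neighbour matching on the Mahalanobis distance. As a rough illustration of the idea only (not MatchIt's implementation), the sketch below does greedy 1:1 matching without replacement on two covariates. The function names and the toy data are hypothetical, the covariance is estimated from the pooled sample as an assumption, and the 2x2 inverse is written out by hand to keep the example in the standard library.

```python
from statistics import mean


def covariance_2d(rows):
    """Sample covariance (sxx, sxy, syy) of a list of (x, y) pairs."""
    xs, ys = zip(*rows)
    mx, my = mean(xs), mean(ys)
    n = len(rows) - 1
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in rows) / n
    return sxx, sxy, syy


def mahalanobis_sq(a, b, cov):
    """Squared Mahalanobis distance between two 2-D points."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = a[0] - b[0], a[1] - b[1]
    # d^T S^-1 d, with S^-1 = (1/det) * [[syy, -sxy], [-sxy, sxx]]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def match_nearest(treated, control):
    """Greedy 1:1 matching: each treated unit takes its closest unused control."""
    cov = covariance_2d(treated + control)  # assumption: pooled covariance
    available = list(range(len(control)))
    pairs = []
    for i, t in enumerate(treated):
        if not available:
            break
        j = min(available, key=lambda k: mahalanobis_sq(t, control[k], cov))
        available.remove(j)
        pairs.append((i, j))
    return pairs


# Hypothetical toy data: two covariates per client.
treated = [(0.0, 0.0), (10.0, 10.0)]
control = [(10.0, 9.0), (1.0, 0.0), (0.0, 10.0)]
pairs = match_nearest(treated, control)  # list of (treated_index, control_index)
```

With equal group sizes and no caliper, a procedure like this pairs every unit, which is in line with the sample-size table above (59627 matched in each group, none unmatched or discarded).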


Observations

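  • Both the table and the plot show that the matching process matched all instances: all 59627 units in each group are matched, and none are unmatched or discarded. Because every unit is retained, the summary of balance for the matched data is identical to the summary for all data, and Percent Balance Improvement is 0 for every covariate.
  • The plot shows that, for adjusted cases (after matching), the standardized mean difference is relatively small for most covariates. The smallest (about \(0.1\) or less in the SMD column above) are: profile_age (0.003), memory_mb (0.058), num_bookmarks (0.081) and num_active_days (0.104). However, some have a high absolute value, most notably num_addons (0.777).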
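
For reference, the two balance diagnostics discussed here can be reproduced by hand. The sketch below assumes the SMD column uses the common pooled-SD definition, |mean_a - mean_b| / sqrt((sd_a^2 + sd_b^2) / 2), which agrees with the reported values for the covariates we checked, and that Percent Balance Improvement is 100 * (|before| - |after|) / |before|. The function names are ours, and the inputs are copied from the tables above.

```python
from math import sqrt


def smd(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference with a pooled standard deviation."""
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return abs(mean_a - mean_b) / pooled_sd


def percent_balance_improvement(before, after):
    """Relative reduction (in percent) of an imbalance measure after matching."""
    return 100 * (abs(before) - abs(after)) / abs(before)


# num_addons: Beta (is_release FALSE) vs Release (is_release TRUE)
smd_num_addons = smd(7.8554, 3.3349, 5.6508, 2.23)        # approximately 0.777
# profile_age: the best-balanced covariate in the table
smd_profile_age = smd(893.7534, 762.4791, 891.6771, 766.17)  # approximately 0.003

# The mean difference of daily_num_sessions_started is unchanged by matching,
# so the improvement is zero.
pbi_sessions = percent_balance_improvement(0.5203, 0.5203)
```

Since the before and after values are the same for every covariate in this run, the improvement is 0 throughout, as reported in the output above.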

Post-matching Beta-Release Difference:

metric                    beta (mean)  release (mean)  delta (mean)  beta (median)  release (median)  delta (median)
active_hours                0.8236611       0.8453998     0.0257141      0.5309524         0.5726852       0.0728722
active_hours_max            1.5775076       1.6260522     0.0298542      1.0638889         1.1500000       0.0748792
uri_count                 152.7454962     155.9369580     0.0204664     86.6666667        96.4000000       0.1009682
uri_count_max             311.0212991     320.9798749     0.0310255    172.0000000       196.0000000       0.1224490
search_count                2.4506498       2.3637896     0.0367461      0.8333333         0.8750000       0.0476190
search_count_max            5.6361715       5.4224764     0.0394091      2.0000000         2.0000000       0.0000000
num_pages                    17363.46        17178.94     0.0107410       4185.667          5396.125       0.2243199
num_pages_max                17558.93        17370.81     0.0108296       4340.000          5576.000       0.2216643
daily_max_tabs              9.6036282       6.1426896     0.5634240      4.2500000         3.7142857       0.1442308
daily_max_tabs_max         13.8114948       9.2162611     0.4986006      6.0000000         6.0000000       0.0000000
daily_unique_domains        5.0604637       4.9503575     0.0222421      3.5625000         3.5873016       0.0069137
daily_unique_domains_max    8.7436101       8.5124946     0.0271502      5.5000000         5.8750000       0.0638298
daily_tabs_opened          20.4919083      17.0614346     0.2010660      9.0000000         8.7500000       0.0285714
daily_tabs_opened_max      39.6478609      33.1348382     0.1965612     17.0000000        17.0000000       0.0000000
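
The delta rows are consistent with a relative absolute difference taken with respect to Release, that is, |beta - release| / release. This is our reading of the numbers rather than a documented definition; the sketch below checks it against two entries of the table, with a function name of our own.

```python
def relative_delta(beta, release):
    """Absolute Beta-Release difference as a fraction of the Release value."""
    return abs(beta - release) / release


# Mean active_hours: the table reports delta (mean) = 0.0257141
delta_active_hours = relative_delta(0.8236611, 0.8453998)

# Mean daily_max_tabs: the table reports delta (mean) = 0.5634240
delta_daily_max_tabs = relative_delta(9.6036282, 6.1426896)
```

Read this way, a delta of 0.56 for daily_max_tabs means the Beta mean is about 56% away from the Release mean, while the 0.026 for active_hours is a difference of under 3%.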

Observations

  • In the Q-Q plots, the best-matched variables lie close to the \(x = y\) line (e.g., active_hours, num_pages, search_count and daily_unique_domains). However, some plots deviate from that line at the high end (e.g., daily_max_tabs and daily_tabs_opened).
  • In the violin plots, the Beta users from V67 are, overall, quite similar to Release users in user engagement once all the other covariates (experiment 3) are balanced. Only the following metrics show a large deviation (as already pointed out by the KS test):

    • num_pages and num_pages_max
    • daily_max_tabs and daily_max_tabs_max
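
The KS test mentioned above compares two empirical distributions through the largest vertical gap between their cumulative distribution functions. The KS computation itself is not shown in this notebook, so the following is only a minimal sketch of the standard two-sample statistic; the function name and the sample values are made up for illustration.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so checking those is enough.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical engagement values for two small groups.
beta_values = [1, 2, 3, 4]
release_values = [3, 4, 5, 6]
d = ks_statistic(beta_values, release_values)
```

A statistic near 0 means the two distributions overlap almost everywhere, while a value near 1 means they barely overlap, which is the kind of deviation the Q-Q and violin plots make visible.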

Second Application - Validation Dataset (V68)

In this application, we need to balance the Beta and Release datasets so that they resemble each other across the covariates we are concerned with, that is, the user engagement metrics. Balancing, in this case, yields a set of client_id values for Beta that resembles Release. This gives us an idea of how these users actually change over time. If we see changes that are larger than anticipated, then we know that something significant is happening in user engagement that we can “forecast” for the subsequent Release.

First, we determine the number of training (V67) Beta and Release clients that are in the validation set (V68).

##     label  freq
## 1    beta 38861
## 2 release 38048

Let’s compare this to the existing distribution:

## Percentage of beta mutual clients: 65 %
## Percentage of release mutual clients: 64 %

Hence, most training clients (65% of Beta and 64% of Release) are in the validation set.
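
These percentages can be recovered from the counts above, assuming the denominator is the 59627 clients per group reported in the training sample sizes. The function name below is ours.

```python
def percent_mutual(n_mutual, n_training):
    """Share of training clients that also appear in the validation set, in percent."""
    return round(100 * n_mutual / n_training)


TRAINING_CLIENTS_PER_GROUP = 59627  # assumption: from the "Sample sizes" output

pct_beta = percent_mutual(38861, TRAINING_CLIENTS_PER_GROUP)
pct_release = percent_mutual(38048, TRAINING_CLIENTS_PER_GROUP)
```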

Holdout Covariates

  • User engagement metrics:
    • active_hours
    • active_hours_max
    • uri_count
    • uri_count_max
    • search_count
    • search_count_max
    • num_pages
    • num_pages_max
    • daily_max_tabs
    • daily_max_tabs_max
    • daily_unique_domains
    • daily_unique_domains_max
    • daily_tabs_opened
    • daily_tabs_opened_max

Subset the validation clients down to those matched:

##     label  freq
## 1    beta 38864
## 2 release 10503
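
The subsetting step amounts to keeping only the validation rows whose client_id appears in the matched set from training, then counting what remains per label. The sketch below illustrates that idea with made-up rows; the field names client_id and label follow the text, but the data structures and the function name are assumptions, not the notebook's actual code.

```python
from collections import Counter


def subset_to_matched(validation_rows, matched_ids):
    """Keep validation rows whose client_id was matched in training."""
    matched = set(matched_ids)  # set lookup keeps the filter linear in the row count
    return [row for row in validation_rows if row["client_id"] in matched]


# Hypothetical validation rows and matched client IDs.
validation_rows = [
    {"client_id": "a1", "label": "beta"},
    {"client_id": "b2", "label": "beta"},
    {"client_id": "c3", "label": "release"},
    {"client_id": "d4", "label": "release"},
]
matched_ids = ["a1", "c3", "zz"]  # "zz" was matched in training but left before validation

subset = subset_to_matched(validation_rows, matched_ids)
label_counts = Counter(row["label"] for row in subset)
```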

Training and Validation Difference:

Mean

metric                    pre-matching  post-matching
active_hours                 0.0663253      0.0649846
active_hours_max             0.1024447      0.0616132
uri_count                    0.0805868      0.0548619
uri_count_max                0.1264968      0.0667833
search_count                 0.0486806      0.0646284
search_count_max             0.0938861      0.0585728
num_pages                    0.0872624      0.0323710
num_pages_max                0.0882048      0.0309630
daily_max_tabs               0.4017325      0.6428505
daily_max_tabs_max           0.3321342      0.5369392
daily_unique_domains         0.0057734      0.1145931
daily_unique_domains_max     0.0288264      0.1180266
daily_tabs_opened            0.1549385      0.1821139
daily_tabs_opened_max        0.1032004      0.1783391

Median

metric                    pre-matching  post-matching
active_hours                 0.1274105      0.1161720
active_hours_max             0.1750000      0.1149194
uri_count                    0.1757679      0.1245614
uri_count_max                0.2361809      0.1428571
search_count                      0.25           0.00
search_count_max                     0              0
num_pages                    0.3747665      0.1473045
num_pages_max                0.3661133      0.1403023
daily_max_tabs               0.0855263      0.1552795
daily_max_tabs_max           0.0000000      0.1666667
daily_unique_domains         0.0425532      0.0666667
daily_unique_domains_max     0.1333333      0.1666667
daily_tabs_opened            0.0555556      0.0024213
daily_tabs_opened_max        0.1176471      0.0500000

metric                    mean beta  mean beta - matched  mean release  median beta  median beta - matched  median release
active_hours               1.798844             1.887442      1.855592     1.502778               1.599653        1.576191
active_hours_max           2.471189             2.712019      2.639107     1.962500               2.219444        2.166667
uri_count                 147.33392            166.10303     160.16014     81.50000              100.80000        98.66667
uri_count_max              288.3003             342.6800      329.9058     153.0000               205.0000        200.0000
search_count               3.324319             3.655976      3.443259     1.750000               2.000000        2.000000
search_count_max           6.100206             7.157421      6.628659     3.000000               4.000000        3.000000
num_pages                 15615.038            21883.033     17107.820     3348.500               7244.364        5355.000
num_pages_max              15780.79             22093.27      17307.29      3514.00                7452.00         5543.00
daily_max_tabs            10.019717            11.283075      7.434692     5.125000               5.428571        4.800000
daily_max_tabs_max         13.82845             15.59544      10.62999      7.00000                8.00000         7.00000
daily_unique_domains       6.148112             6.832382      6.118560     4.500000               5.066667        4.655556
daily_unique_domains_max   9.581990            11.218976      9.836721     6.200000               8.000000        7.000000
daily_tabs_opened          21.03166             22.26366      18.34435      9.50000               10.85714        10.00000
daily_tabs_opened_max      38.29553             43.03337      34.80667     16.00000               20.00000        18.00000

Training Covariates

  • Experiment 3:
    • daily_num_sessions_started
    • daily_num_sessions_started_max
    • FX_PAGE_LOAD_MS_2_PARENT
    • memory_mb
    • num_active_days
    • num_addons
    • num_bookmarks
    • profile_age
    • session_length
    • session_length_max
    • TIME_TO_DOM_COMPLETE_MS
    • TIME_TO_DOM_CONTENT_LOADED_END_MS
    • TIME_TO_DOM_INTERACTIVE_MS
    • TIME_TO_LOAD_EVENT_END_MS
    • TIME_TO_NON_BLANK_PAINT_MS

Mean

covariate                          pre-matching  post-matching
daily_num_sessions_started            0.1553457      0.2620936
daily_num_sessions_started_max        0.2030987      0.2583906
FX_PAGE_LOAD_MS_2_PARENT              0.2416733      0.0872071
memory_mb                             0.0975787      0.0036564
num_active_days                       0.1408608      0.0571241
num_addons                            0.2104416      0.2435996
num_bookmarks                         0.4273128      0.5854632
profile_age                           0.0095897      0.0312860
session_length                        0.2667912      0.4729580
session_length_max                    0.1927862      0.3982918
TIME_TO_DOM_COMPLETE_MS               0.5111225      0.2255813
TIME_TO_DOM_CONTENT_LOADED_END_MS     0.3487846      0.1648655
TIME_TO_DOM_INTERACTIVE_MS            0.4912587      0.2541375
TIME_TO_LOAD_EVENT_END_MS             0.5404025      0.2456803
TIME_TO_NON_BLANK_PAINT_MS            0.3952939      0.2059763

Median

covariate                          pre-matching  post-matching
daily_num_sessions_started            0.1666667      0.2777778
daily_num_sessions_started_max             0.25           0.25
FX_PAGE_LOAD_MS_2_PARENT              0.2104484      0.0670602
memory_mb                             0.0123870      0.0003716
num_active_days                       0.1666667      0.0000000
num_addons                                  0.2            0.2
num_bookmarks                         0.1153846      0.0588235
profile_age                           0.0237389      0.0584577
session_length                        0.0517359      0.3153878
session_length_max                    0.0091046      0.3814794
TIME_TO_DOM_COMPLETE_MS               0.3024139      0.1226249
TIME_TO_DOM_CONTENT_LOADED_END_MS     0.2601641      0.1106528
TIME_TO_DOM_INTERACTIVE_MS            0.2964177      0.1435491
TIME_TO_LOAD_EVENT_END_MS             0.3124521      0.1357234
TIME_TO_NON_BLANK_PAINT_MS            0.2423833      0.1132518

covariate                          mean beta  mean beta - matched  mean release  median beta  median beta - matched  median release
daily_num_sessions_started          3.398131             3.374964      3.839186     2.666667               2.625000        3.000000
daily_num_sessions_started_max      5.141417             5.406726      6.196901     4.000000               4.000000        5.000000
FX_PAGE_LOAD_MS_2_PARENT            3593.767             3149.558      2894.488     3045.818               2711.802        2516.446
memory_mb                           8796.994             9535.566      9748.104     7974.000               8071.000        8074.000
num_active_days                     5.912574             6.835812      6.718018     6.000000               7.000000        7.000000
num_addons                          7.882667             8.212453      6.686080     7.000000               7.000000        6.000000
num_bookmarks                       226.4153             273.6836      158.9298      24.0000                37.0000         27.0000
profile_age                         876.2575            1018.2650      884.7323     691.0000               852.0000        675.0000
session_length                     13.336721            13.851032     10.738558     8.154375               9.323482        7.802444
session_length_max                  23.27297             24.89952      19.67306     13.96139               17.61306        13.84445
TIME_TO_DOM_COMPLETE_MS             4597.715             3711.780      3042.921     3025.414               2636.429        2323.160
TIME_TO_DOM_CONTENT_LOADED_END_MS   2897.937             2515.148      2148.813     2002.342               1794.666        1589.160
TIME_TO_DOM_INTERACTIVE_MS          2627.875             2157.706      1762.515     1765.381               1557.169        1361.967
TIME_TO_LOAD_EVENT_END_MS           4376.866             3492.088      2841.729     2857.994               2485.726        2177.837
TIME_TO_NON_BLANK_PAINT_MS          2015.849             1698.813      1445.032     1359.857               1218.062        1094.750